Abstract

This paper proposes very efficient algorithms, linear in the number of arcs, for determining
Hummon and Doreian's arc weights SPLC and SPNP in citation networks, and presents some
theoretical properties of these weights. The nonacyclicity problem in citation networks is
discussed. An approach for identifying an important small subnetwork on the basis of arc
weights is proposed and illustrated on the citation networks of the SOM (self-organizing
maps) literature and US patents.

Keywords: large network, acyclic, citation network, main path, CPM path, arc weight,

1 Introduction

Citation network analysis started with the paper of Garfield et al. (1964) [10], in which the
introduction of the notion of a citation network is attributed to Gordon Allen. In that paper,
using the example of Asimov's history of DNA [1], it was shown that the analysis "demonstrated
a high degree of coincidence between an historian's account of events and the citational
relationship between these events". An early overview of possible applications of graph theory
in citation network analysis was made in 1965 by Garner [13].

The next important step was made by Hummon and Doreian (1989) [14, 15, 16]. They
proposed three indices (NPPC, SPLC, SPNP) - weights of arcs that provide an automatic
way to identify the (most) important part of the citation network - the main path analysis.

In this paper we go a step further. We show how to compute Hummon and Doreian's weights
efficiently, so that they can also be used for the analysis of very large citation networks
with several thousands of vertices. In addition, some theoretical properties of Hummon and
Doreian's weights are presented.

The proposed methods are implemented in Pajek - a program, for Windows (32 bit), for
analysis of large networks. It is freely available, for noncommercial use, at its homepage [4].

For basic notions of graph theory see Wilson and Watkins [18].

Table 1: Citation network characteristics

network                 n         m  m0    n0       nc    kc   h  Δin  Δout   2  3  4
DNA                    40        60         1       35     3  11    7     5
Coupling              223       657   1     5      218     1  16   19   134
Small world           396      1988       163      233     1  16   60   294
Small & Griffith     1059      4922   1    35     1024        28   89   232   2
Cocitation           1059      4929   1    35     1024        28   90   232   2
Scientometrics       3084     10416   1   355     2678    21  32  121   105   5  2  1
Kroto                3244     31950   1           3244        32  166  3243   6
SOM                  4470     12731   2   698     3704    27  24   51   735  11
Zewail               6752     54253   1   101     6640     5  75  166   227  38  1  2
Lederberg            8843     41609   7   519     8212    35  63  135  1098  54  4
Desalination         8851     25751   7  1411     7143   115  27   73   137  12     1
US patents        3774768  16522438   1        3764117  3627  32  779   770

2 Citation Networks 

In a given set of units U (articles, books, works, ...) we introduce a citing relation $R \subseteq U \times U$

$$uRv \equiv v \text{ cites } u$$

which determines a citation network N = (U, R).

Table 1 presents some characteristics of real-life citation networks. Most of these
networks were obtained from Eugene Garfield's collection of citation data [10, 12], produced
using HistCite Software (formerly called HistComp - compiled historiography program) [11].
All of these networks are the result of searches in the Web of Science and are used with the
permission of ISI of Philadelphia, www.isinet.com. These networks in Pajek's format
are available from Pajek's web site [19].

In Table 1: $n = |U|$ is the number of vertices; $m = |R|$ is the number of arcs; $m_0$ is the
number of loops; $n_0$ is the number of isolated vertices; $n_c$ is the size of the largest weakly
connected component; $k_c$ is the number of nontrivial weakly connected components; h is the
depth of the network (minimum number of levels); $\Delta_{in}$ is the maximum input degree; and
$\Delta_{out}$ is the maximum output degree. The last three columns contain the numbers of strongly
connected components (cyclic parts) of size 2, 3 and 4.

A citing relation is usually irreflexive, $\forall u \in U : \neg uRu$, and (almost) acyclic - no vertex
is reachable from itself by a nontrivial path, or formally $\forall u \in U\ \forall k \in \mathbb{N}^+ : \neg uR^k u$. In the
following we shall assume that it has this property. We shall postpone the question of how to
deal with nonacyclic citation networks until the end of the theoretical part of the paper.

For a relation $Q \subseteq U \times U$ we denote by $Q^{inv}$ its inverse relation, $uQ^{inv}v \equiv vQu$, and by

$$Q(u) = \{v \in U : uQv\}$$

the set of successors of unit $u \in U$. If Q is acyclic then $Q^{inv}$ is also acyclic. This means that
the network $N^{inv} = (U, R^{inv})$, $uR^{inv}v \equiv u$ cites $v$, is a network of the same type as the original
citation network N = (U, R). Therefore it is just a matter of 'taste' which relation to select.

Figure 1: Citation Network in Standard Form

Let $I = \{(u,u) : u \in U\}$ be the identity relation on U and $\overline{Q} = \bigcup_{k \in \mathbb{N}^+} Q^k$ the transitive
closure of relation Q. Then Q is acyclic iff $\overline{Q} \cap I = \emptyset$. The relation $Q^* = \overline{Q} \cup I$ is the
transitive and reflexive closure of relation Q.

Since the set of units U is finite and R is acyclic we know from the theory of relations that:

• The set of units U can be topologically ordered - there exists a surjective mapping
(permutation) $i : U \to 1..|U|$ with the property

$$uRv \Rightarrow i(u) < i(v)$$

• Let $\mathrm{Min}\,R = \{u \in U : R^{inv}(u) = \emptyset\}$ be the set of minimal elements and
$\mathrm{Max}\,R = \{u \in U : R(u) = \emptyset\}$ the set of maximal elements. Then $\mathrm{Min}\,R \neq \emptyset$ and $\mathrm{Max}\,R \neq \emptyset$.

• Every unit $u \in U$ and every arc $(u,v) \in R$ belong to at least one path from Min R to
Max R:

$$\forall u \in U : R^*(u) \cap \mathrm{Max}\,R \neq \emptyset$$

$$\forall u \in U : R^{inv*}(u) \cap \mathrm{Min}\,R \neq \emptyset$$

To simplify the presentation we transform a citation network N = (U, R) to its standard
form N' = (U', R') (see Figure 1) by extending the set of units $U' := U \cup \{s, t\}$, $s, t \notin U$,
with a common source (initial unit) s and a common sink (terminal unit) t, and by adding the
corresponding arcs to relation R

$$R' := R \cup \{s\} \times \mathrm{Min}\,R \cup \mathrm{Max}\,R \times \{t\} \cup \{(t, s)\}$$

This eliminates problems with networks with several connected components and/or several
initial/terminal units. In the following we shall assume that the citation network N = (U, R) is in
the standard form. Note that, to make the theory smoother, we added to R' also the 'feedback'
arc (t, s), thus destroying its acyclicity.
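
The transformation to standard form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standard_form`, the default labels "s" and "t", and the representation of R as a set of (u, v) pairs are choices made here, not part of Pajek.

```python
def standard_form(units, arcs, s="s", t="t"):
    """Return (U', R') for a citation network given as units and (u, v) arcs.

    An arc (u, v) stands for uRv. The labels s and t are hypothetical
    names for the added source and sink and must not occur in units.
    """
    units = set(units)
    arcs = set(arcs)
    if s in units or t in units:
        raise ValueError("source/sink labels must not be existing units")
    has_predecessor = {v for _, v in arcs}
    has_successor = {u for u, _ in arcs}
    min_r = units - has_predecessor      # Min R: units without predecessors
    max_r = units - has_successor        # Max R: units without successors
    new_arcs = set(arcs)
    new_arcs |= {(s, u) for u in min_r}  # {s} x Min R
    new_arcs |= {(u, t) for u in max_r}  # Max R x {t}
    new_arcs.add((t, s))                 # the 'feedback' arc
    return units | {s, t}, new_arcs
```

An isolated unit belongs to both Min R and Max R, so it ends up on the path s, u, t of the standard form.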

3 Analysis of Citation Networks

One approach to the analysis of a citation network is to determine for each unit / arc its
importance or weight. These values are used afterward to determine the essential substructures in
the network. In this paper we shall focus on the methods of assigning weights $w : R \to \mathbb{R}_0^+$ to arcs
proposed by Hummon and Doreian [14, 15]:

• node pair projection count (NPPC) method: $w_d(u,v) = |R^{inv*}(u)| \cdot |R^*(v)|$

• search path link count (SPLC) method: $w_l(u,v)$ equals the number of "all possible search
paths through the network emanating from an origin node" through the arc $(u,v) \in R$,
[14, p. 50].

• search path node pair (SPNP) method: $w_p(u,v)$ "accounts for all connected vertex pairs
along the paths through the arc $(u,v) \in R$", [14, p. 51].

3.1 Computing NPPC weights 

To compute $w_d$ for sets of units of moderate size (up to some thousands of units) the matrix
representation of R can be used and its transitive closure computed by Roy-Warshall's algorithm
[9]. The quantities $|R^*(v)|$ and $|R^{inv*}(u)|$ can be obtained from the closure matrix as row/column
sums. An O(nm) algorithm for computing $w_d$ can be constructed using Breadth First Search
from each $u \in U$ to determine $|R^{inv*}(u)|$ and $|R^*(v)|$. Since it is of order at least $O(n^2)$ this
algorithm is not suitable for larger networks (several tens of thousands of vertices).
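
The O(nm) variant based on Breadth First Search could look roughly as follows. This is a minimal sketch, not the Pajek implementation; the function names and the adjacency-list representation are assumptions made for the illustration.

```python
from collections import deque

def reach_count(adj, start):
    """Number of vertices reachable from start (start included), by BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen)

def nppc_weights(units, arcs):
    """NPPC weights w_d(u, v) = |R^inv*(u)| * |R*(v)| for every arc (u, v)."""
    units = list(units)
    succ = {u: [] for u in units}
    pred = {u: [] for u in units}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    down = {u: reach_count(succ, u) for u in units}  # |R*(u)|
    up = {u: reach_count(pred, u) for u in units}    # |R^inv*(u)|
    return {(u, v): up[u] * down[v] for u, v in arcs}
```

One search per unit in each direction gives the O(nm) bound mentioned above.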

3.2 Search path count method 

To compute the SPLC and SPNP weights we introduce a related search path count (SPC)
method for which the weights $N(u,v)$, $uRv$, count the number of different paths from s to t
(or from Min R to Max R) through the arc (u, v).

To compute N(u, v) we introduce two auxiliary quantities: let $N^-(v)$ denote the number
of different s-v paths, and $N^+(v)$ the number of different v-t paths.

Every s-t path $\pi$ containing the arc $(u,v) \in R$ can be uniquely expressed in the form

$$\pi = \sigma \circ (u,v) \circ \tau$$

where $\sigma$ is an s-u path and $\tau$ is a v-t path. Since every pair $(\sigma, \tau)$ of s-u / v-t paths gives a
corresponding s-t path it follows:

$$N(u,v) = N^-(u) \cdot N^+(v), \qquad (u,v) \in R$$

where

$$N^-(u) = \begin{cases} 1 & u = s \\ \sum_{v : vRu} N^-(v) & \text{otherwise} \end{cases}$$

and

$$N^+(u) = \begin{cases} 1 & u = t \\ \sum_{v : uRv} N^+(v) & \text{otherwise} \end{cases}$$

This is the basis of an efficient algorithm for computing the weights N(u, v) - after the
topological sort of the network [9] we can compute, using the above relations in topological order,
the weights in time of order O(m). The topological order ensures that all the quantities in the
right side expressions of the above equalities are already computed when needed. The counters
N(u, v) are used as SPC weights $w_c(u,v) = N(u,v)$.
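
As an illustration of the procedure just described, here is a hedged Python sketch. It works directly on an acyclic network without the explicit units s and t: a unit with no predecessors gets $N^- = 1$ and a unit with no successors gets $N^+ = 1$, which corresponds to counting paths from Min R to Max R. The function names are hypothetical, and Python's unbounded integers stand in for the counters.

```python
from collections import deque

def topological_order(units, arcs):
    """Kahn's algorithm; raises ValueError if the network has a cycle."""
    succ = {u: [] for u in units}
    indeg = {u: 0 for u in units}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(u for u in units if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(indeg):
        raise ValueError("network is not acyclic")
    return order

def spc_weights(units, arcs):
    """Return (N, N_minus, N_plus) with N[(u, v)] = N_minus[u] * N_plus[v]."""
    units = list(units)
    order = topological_order(units, arcs)
    succ = {u: [] for u in units}
    pred = {u: [] for u in units}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    n_minus, n_plus = {}, {}
    for u in order:                      # predecessors are already computed
        n_minus[u] = sum(n_minus[x] for x in pred[u]) if pred[u] else 1
    for u in reversed(order):            # successors are already computed
        n_plus[u] = sum(n_plus[y] for y in succ[u]) if succ[u] else 1
    weights = {(u, v): n_minus[u] * n_plus[v] for u, v in arcs}
    return weights, n_minus, n_plus
```

Each arc is visited a constant number of times, which is where the O(m) bound comes from.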



3.3 Computing SPLC and SPNP weights 

The description of the SPLC method in [14] is not very precise. Analyzing the table of SPLC
weights from [14, p. 50] we see that we have to consider each vertex as an origin of search
paths. This is equivalent to applying the SPC method to the extended network $N_l = (U', R_l)$

$$R_l := R' \cup \{s\} \times (U \setminus R'(s))$$

It seems that there are some errors in the table of SPNP weights in [14, p. 51]. Using the
definition of the SPNP weights we can again reduce their computation to the SPC method applied
to the extended network $N_p = (U', R_p)$

$$R_p := R \cup \{s\} \times U \cup U \times \{t\} \cup \{(t, s)\}$$

in which every unit $u \in U$ is additionally linked from the source s and to the sink t.
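
The two reductions can be sketched without materializing the extra arcs. In the sketch below (an illustration only; the function name and the `method` parameter are assumptions) an arc from s to every unit contributes 1 to each backward count, and an arc from every unit to t contributes 1 to each forward count. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) for the topological order.

```python
from graphlib import TopologicalSorter

def search_path_weights(units, arcs, method="SPC"):
    """SPC, SPLC or SPNP arc weights of an acyclic network.

    SPLC: every unit is linked from s.  SPNP: every unit is linked
    from s and to t.  SPC: only Min R is linked from s, Max R to t.
    """
    if method not in ("SPC", "SPLC", "SPNP"):
        raise ValueError("unknown method")
    units = list(units)
    pred = {u: [] for u in units}
    succ = {u: [] for u in units}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    order = list(TopologicalSorter(pred).static_order())
    from_all = method in ("SPLC", "SPNP")  # arc (s, u) for every unit u
    to_all = method == "SPNP"              # arc (u, t) for every unit u
    n_minus, n_plus = {}, {}
    for u in order:
        direct = 1 if (from_all or not pred[u]) else 0
        n_minus[u] = direct + sum(n_minus[x] for x in pred[u])
    for u in reversed(order):
        direct = 1 if (to_all or not succ[u]) else 0
        n_plus[u] = direct + sum(n_plus[y] for y in succ[u])
    return {(u, v): n_minus[u] * n_plus[v] for u, v in arcs}
```

On the chain a, b, c the SPNP weight of the arc (a, b) is 2, matching the two connected vertex pairs (a, b) and (a, c) whose paths pass through that arc.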



3.4 Computing the numbers of paths of length k 

We could also use a direct approach to determine the weights $w_p$. Let $L^-(u)$ be the number of
different paths terminating in u and $L^+(u)$ the number of different paths originating in u. Then
for $uRv$ it holds $w_p(u,v) = L^-(u) \cdot L^+(v)$.

The procedure to determine $L^-(u)$ and $L^+(u)$ can be compactly described using two
families of polynomial generating functions

$$P^-(u;x) = \sum_{k=0}^{h(u)} p^-(u,k)\, x^k \quad \text{and} \quad P^+(u;x) = \sum_{k=0}^{h^-(u)} p^+(u,k)\, x^k, \qquad u \in U$$

where h(u) is the depth of vertex u in network (U, R), and $h^-(u)$ is the depth of vertex u in
network $(U, R^{inv})$. The coefficient $p^-(u,k)$ counts the number of paths of length k to u, and
$p^+(u,k)$ counts the number of paths of length k from u.
Again, by the basic principles of combinatorics

$$P^-(u;x) = \begin{cases} 0 & u = s \\ 1 + x \cdot \sum_{v : vRu} P^-(v;x) & \text{otherwise} \end{cases}$$

and

$$P^+(u;x) = \begin{cases} 0 & u = t \\ 1 + x \cdot \sum_{v : uRv} P^+(v;x) & \text{otherwise} \end{cases}$$

and both families can be determined using the definitions and computing the polynomials in
the (reverse for $P^+$) topological ordering of U. The complexity of this procedure is at most
O(hm). Finally

$$L^-(u) = P^-(u;1) \quad \text{and} \quad L^+(v) = P^+(v;1)$$
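
A small sketch of the polynomial computation follows, with each polynomial stored as a list of coefficients. It assumes the network is given without s and t, so a unit with no predecessors simply gets the polynomial 1 (the trivial path of length 0); the function names are hypothetical.

```python
from graphlib import TopologicalSorter

def backward_path_polynomials(units, arcs):
    """coeffs[u][k] = number of paths of length k terminating in u."""
    units = list(units)
    pred = {u: [] for u in units}
    for u, v in arcs:
        pred[v].append(u)
    coeffs = {}
    for u in TopologicalSorter(pred).static_order():
        poly = [1]                       # the trivial path of length 0
        for v in pred[u]:
            shifted = [0] + coeffs[v]    # multiply P^-(v; x) by x
            if len(shifted) > len(poly):
                poly.extend([0] * (len(shifted) - len(poly)))
            for k, c in enumerate(shifted):
                poly[k] += c
        coeffs[u] = poly
    return coeffs

def evaluate(poly, x):
    """Value of the polynomial with the given coefficient list at x."""
    return sum(c * x ** k for k, c in enumerate(poly))
```

Evaluating at x = 1 gives $L^-(u)$; the polynomials $P^+$ are obtained by calling the same function on the reversed arcs (v, u).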

In real-life citation networks the depth h is relatively small, as can be seen from Table 1.

The complexity of this approach is higher than the complexity of the method proposed in
subsection 3.3 - but we get more detailed information about paths. It may make sense
to consider 'aging' of references by $L^-(u) = P^-(u;\alpha)$, for selected $\alpha$, $0 < \alpha < 1$.

3.5 Vertex weights 

The quantities used to compute the arc weights w can also be used to define the corresponding
vertex weights t

$$t_d(u) = |R^{inv*}(u)| \cdot |R^*(u)|$$

$$t_c(u) = N^-(u) \cdot N^+(u)$$

$$t_l(u) = N'^-(u) \cdot N'^+(u)$$

$$t_p(u) = L^-(u) \cdot L^+(u)$$

They count the number of paths of the selected type through the vertex u.

3.6 Implementation details 

In our first implementation of the SPNP method the values of $L^-(u)$ and $L^+(u)$ for some
large networks (Zewail and Lederberg) exceeded the range of Delphi's LargeInt (20 decimal
places). We decided to use the Extended real numbers (range $3.6 \times 10^{-4951}$ .. $1.1 \times 10^{4932}$,
19-20 significant digits) for counters. This range is safe also for very large citation networks.

To see this, let us denote $N^*(k) = \max_{u : h(u) = k} N^-(u)$. Note that $h(s) = 0$ and
$uRv \Rightarrow h(u) < h(v)$. Let $u^* \in U$ be a unit on which the maximum is attained, $N^*(k) = N^-(u^*)$. Then

$$N^*(k) = \sum_{v : vRu^*} N^-(v) \le \sum_{v : vRu^*} N^*(h(v)) \le \sum_{v : vRu^*} N^*(k-1) = \deg_{in}(u^*) \cdot N^*(k-1) \le \Delta_{in}(k) \cdot N^*(k-1)$$

where $\Delta_{in}(k)$ is the maximal input degree at depth k. Therefore $N^*(h) \le \prod_{k=1}^{h} \Delta_{in}(k) \le \Delta_{in}^h$.

A similar inequality holds also for $N^+(u)$. From both it follows

$$N(u,v) \le \Delta_{in}^{h(u)} \cdot \Delta_{out}^{h^-(v)} \le \Delta^{H-1}$$

where $H = h(t)$ and $\Delta = \max(\Delta_{in}, \Delta_{out})$. Therefore for $H \le 1000$ and $\Delta \le 10000$ we get
$N(u,v) \le \Delta^{H-1} \le 10^{4000}$, which is still in the range of Extended reals. Note also that in
the derivation of this inequality we were very generous - in real-life networks N(u, v) will be
much smaller than $\Delta^{H-1}$.

Very large/small numbers that result as weights in large networks are not easy to use. One
possibility to overcome this problem is to use the logarithms of the obtained weights - the
logarithmic transformation is monotone and therefore preserves the ordering of weights (importance
of vertices and arcs). The transformed values are also more convenient for visualization with line
thickness of arcs.

4 Properties of weights

4.1 General properties of weights

Directly from the definitions of weights we get

$$w_k(u,v;R) = w_k(v,u;R^{inv}), \qquad k = d, c, p$$

and

$$w_c(u,v) \le w_l(u,v) \le w_p(u,v)$$

Let $N_A = (U_A, R_A)$ and $N_B = (U_B, R_B)$, $U_A \cap U_B = \emptyset$, be two citation networks, and
$N_1 = (U'_A, R'_A)$ and $N_2 = ((U_A \cup U_B)', (R_A \cup R_B)')$ the corresponding standardized networks
of the first network and of the union of both networks. Then it holds for all $u, v \in U_A$ and for
all $p, q \in R_A$

$$\frac{t^{(1)}(u)}{t^{(1)}(v)} = \frac{t^{(2)}(u)}{t^{(2)}(v)} \quad \text{and} \quad \frac{w^{(1)}(p)}{w^{(1)}(q)} = \frac{w^{(2)}(p)}{w^{(2)}(q)}$$

where $t^{(1)}$ and $w^{(1)}$ are weights on network $N_1$, and $t^{(2)}$ and $w^{(2)}$ are weights on network $N_2$. This
means that adding or removing components in a network does not change the ratios (ordering) of
the weights inside components.

Let $N_1 = (U, R_1)$ and $N_2 = (U, R_2)$ be two citation networks over the same set of units
U and $R_1 \subseteq R_2$; then

$$w_k(u,v;R_1) \le w_k(u,v;R_2), \qquad k = d, c, p$$

4.2 NPPC weights 

In an acyclic network for every arc $(u,v) \in R$ it holds that

$$R^{inv*}(u) \cap R^*(v) = \emptyset \quad \text{and} \quad R^{inv*}(u) \cup R^*(v) \subseteq U$$

therefore $|R^{inv*}(u)| + |R^*(v)| \le n$ and, using the inequality $\sqrt{ab} \le \frac{1}{2}(a + b)$, also

$$w_d(u,v) = |R^{inv*}(u)| \cdot |R^*(v)| \le \frac{1}{4} n^2$$

Close to the source or sink the weights $w_d$ are small, since the sets $R^*(u)$ (and $R^{inv*}(u)$) are
monotonic along the paths in the sense

$$uRv \Rightarrow R^*(v) \subset R^*(u)$$

The weights $w_d$ are larger in the 'middle' of the network.

A more uniform (but less sensitive) weight would be $w_s(u,v) = |R^{inv*}(u)| + |R^*(v)|$ or, in
the normalized form, $w'_s(u,v) = \frac{1}{n} w_s(u,v)$.

4.3 SPC weights

For the flow N(u, v) Kirchhoff's node law holds:

For every node v in a citation network in standard form it holds

incoming flow = outgoing flow = $t_c(v)$

Proof:

$$\sum_{x : xRv} N(x,v) = \sum_{x : xRv} N^-(x) \cdot N^+(v) = \Big(\sum_{x : xRv} N^-(x)\Big) \cdot N^+(v) = N^-(v) \cdot N^+(v)$$

$$\sum_{y : vRy} N(v,y) = \sum_{y : vRy} N^-(v) \cdot N^+(y) = N^-(v) \cdot \sum_{y : vRy} N^+(y) = N^-(v) \cdot N^+(v)$$

□

From Kirchhoff's node law it follows that the total flow through the citation network
equals N(t, s). This gives us a natural way to normalize the weights

$$w(u,v) = \frac{N(u,v)}{N(t,s)} \quad \Rightarrow \quad 0 \le w(u,v) \le 1$$

If C is a minimal arc-cut-set

$$\sum_{(u,v) \in C} w(u,v) = 1$$
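
The normalization and the node law can be checked numerically on a small example. The sketch below is an illustration under assumptions: it works without explicit s and t, takes the total flow N(t, s) to be the number of paths from Min R to Max R, and uses exact fractions so that the sums can be compared with 1 directly. The function name is hypothetical.

```python
from fractions import Fraction
from graphlib import TopologicalSorter

def normalized_spc(units, arcs):
    """Return (w, total): normalized SPC weights and the total flow."""
    units = list(units)
    pred = {u: [] for u in units}
    succ = {u: [] for u in units}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    order = list(TopologicalSorter(pred).static_order())
    n_minus, n_plus = {}, {}
    for u in order:
        n_minus[u] = sum(n_minus[x] for x in pred[u]) if pred[u] else 1
    for u in reversed(order):
        n_plus[u] = sum(n_plus[y] for y in succ[u]) if succ[u] else 1
    # total flow: all paths that start in a unit without predecessors
    total = sum(n_plus[u] for u in units if not pred[u])
    w = {(u, v): Fraction(n_minus[u] * n_plus[v], total) for u, v in arcs}
    return w, total
```

For the 'diamond' network with arcs (a, b), (a, c), (b, d), (c, d) every arc gets the weight 1/2, the flow into b equals the flow out of b, and the weights of the cut {(a, b), (a, c)} sum to 1.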

Let $K_n = \{(u,v) : u, v \in 1..n \wedge u < v\}$ be the complete acyclic directed graph on n vertices;
then the value of $N(u,v;K_n)$ is maximum over all citation networks on n units. It is easy to
verify that

$$N(1,n;K_n) = 2^{n-2}$$

and in general

$$N(i,j;K_n) = 2^{j-i-1}, \qquad i < j$$

From this result we see that the exhaustive search algorithm proposed in Hummon and Doreian
[14, 15] can require exponential time to compute the arc weights w.
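
The closed form can be checked by direct counting. The sketch below reads $N(i,j;K_n)$ as the number of different i-j paths in $K_n$, which is an interpretation assumed here for the illustration; the function name is hypothetical.

```python
def paths_in_complete_dag(n, i, j):
    """Number of different i-j paths in K_n (arcs u -> v for all u < v)."""
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    count = {i: 1}                       # the trivial path at i
    for v in range(i + 1, j + 1):
        # every path to v extends a path to some earlier vertex u >= i
        count[v] = sum(count[u] for u in range(i, v))
    return count[j]
```

Each subset of the j - i - 1 intermediate vertices gives exactly one path, which explains the power of two and the exponential cost of enumerating the paths one by one.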



5 Nonacyclic citation networks 

The problem with cycles is that if there is a cycle in a network then there is also an infinite 
number of trails between some units. There are some standard approaches to overcome the 
problem: 

• to introduce some 'aging' factor which makes the total weight of all trails converge to 
some finite value; 

• to restrict the definition of a weight to some finite subset of trails - for example paths or
geodesics.

Figure 2: Preprint transformation

But, new problems arise: What is the right value of the 'aging' factor? Is there an efficient
algorithm to count the restricted trails?

The other possibility, since a citation network is usually almost acyclic, is to transform it 
into an acyclic network 

• by identification (shrinking) of cyclic groups (nontrivial strong components), or 

• by deleting some arcs, or 

• by transformations such as the 'preprint' transformation (see Figure 2) which is based on 
the following idea: Each paper from a strong component is duplicated with its 'preprint' 
version. The papers inside strong component cite preprints. 

Large strong components in a citation network are unlikely - their presence usually indicates
an error in the data. An exception to this rule is the citation network of High Energy Particle
Physics literature [20] from arXiv. In it different versions of the same paper are treated as a
unit. This leads to large strongly connected components. The idea of the preprint transformation
can also be used in this case to eliminate cycles.

6 First Example: SOM citation network 

The purpose of this example is not the analysis of the selected citation network on SOM
(self-organizing maps) literature [12, 24, 23], but to present typical steps and results in citation
network analysis. We made our analysis using the program Pajek.

First we test the network for acyclicity. Since in the SOM network there are 11 nontrivial
strong components of size 2, see Table 1, we have to transform the network into an acyclic one. We
decided to do this by shrinking each component into a single vertex. This operation produces
some loops that should be removed.

Figure 3: Main path and CPM path in SOM network with SPC weights



Now, we can compute the citation weights. We selected the SPC (search path count) method. 
It returns the following results: the network with citation weights on arcs, the main path network 
and the vector with vertex weights. 

In a citation network, a main path (sub)network is constructed starting from the source 
vertex and selecting at each step in the end vertex/vertices the arc(s) with the highest weight, 
until a sink vertex is reached. 
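
The greedy construction can be sketched as follows. This is one possible reading of the description, assumed here for illustration: at every current end vertex the outgoing arc(s) of highest weight are selected, and ties are all kept. The function name is hypothetical and this is not Pajek's code.

```python
def main_path(weights, source):
    """Arcs selected greedily from source; weights maps (u, v) -> weight."""
    succ = {}
    for (u, v), w in weights.items():
        succ.setdefault(u, []).append((v, w))
    selected = set()
    visited = set()
    frontier = {source}
    while frontier:
        next_frontier = set()
        for u in frontier:
            visited.add(u)
            out = succ.get(u, [])
            if not out:                  # a sink vertex is reached
                continue
            best = max(w for _, w in out)
            for v, w in out:
                if w == best:            # keep all arcs of highest weight
                    selected.add((u, v))
                    if v not in visited:
                        next_frontier.add(v)
        frontier = next_frontier
    return selected
```

Because the choice is local, the selected arcs need not lie on the path with the largest total weight.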

Another possibility is to apply to the network N = (U, R, w) the critical path method
(CPM) from operations research.
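
A critical path in this setting is a path with the largest total weight. A minimal sketch, assuming nonnegative weights and returning only one such path (a complete CPM implementation would report all of them); the function name is hypothetical:

```python
from graphlib import TopologicalSorter

def critical_path(weights):
    """Return (path, total) for a path of largest total arc weight."""
    pred = {}
    for u, v in weights:
        pred.setdefault(u, [])
        pred.setdefault(v, []).append(u)
    best = {}   # largest total weight of a path ending in each vertex
    back = {}   # previous vertex on such a path, None at its start
    for v in TopologicalSorter(pred).static_order():
        best[v], back[v] = 0, None
        for u in pred[v]:
            candidate = best[u] + weights[(u, v)]
            if back[v] is None or candidate > best[v]:
                best[v], back[v] = candidate, u
    end = max(best, key=best.get)
    path = [end]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    path.reverse()
    return path, best[end]
```

On the weights {(s, a): 3, (s, b): 2, (a, t): 1, (b, t): 5} the critical path is s, b, t with total weight 7, while a greedy choice at s would follow the arc (s, a).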

First we draw the main path network. The arc weights are represented by the thickness of
arcs. To produce a nice picture of it we apply Pajek's macro Layers, which contains a
sequence of operations for determining a layered layout of an acyclic network (used also in
the analysis of genealogies represented by p-graphs). Some experiments with settings of different
options are needed to obtain a good picture, see the left part of Figure 3. In its right part the CPM
path is presented.

Table 2: 15 Hubs and Authorities

Rank  h        Hub Id                      a        Authority Id
 1    0.06442  CLARK-JW-1991-V36-P1259     0.85214  HOPFIELD-JJ-1982-V79-P2554
 2    0.06366  #GARDNER-E-1988-V21-P257    0.33427  KOHONEN-T-1982-V43-P59
 3    0.05794  HUANG-SH-1994-V17-P212      0.14531  KOHONEN-T-1990-V78-P1464
 4    0.05721  GULATI-S-1991-V33-P173      0.12398  CARPENTER-GA-1987-V37-P54
 5    0.05513  SHUBNIKOV-EI-1997-V64-P989  0.10376  #GARDNER-E-1988-V21-P257
 6    0.05496  MARSHALL-JA-1995-V8-P335    0.09353  HOPFIELD-JJ-1986-V233-P625
 7    0.05488  VEMURI-V-1993-V36-P203      0.07882  MCELIECE-RJ-1987-V33-P461
 8    0.05409  CHENG-B-1994-V9-P2          0.07656  KOHONEN-T-1988-V1-P3
 9    0.05360  BUSCEMA-M-1998-V33-P17      0.07372  RUMELHART-DE-1985-V9-P75
10    0.05258  XU-L-1993-V6-P627           0.07271  KOSKO-B-1988-V18-P49
11    0.05249  WELLS-DM-1998-V41-P173      0.07246  ANDERSON-JA-1977-V84-P413
12    0.05233  SCHYNS-PG-1991-V15-P461     0.07033  AMARI-SI-1977-V26-P175
13    0.05173  SMITH-KA-1999-V11-P15       0.06709  KOSKO-B-1987-V26-P4947
14    0.05149  BONABEAU-E-1998-V9-P1107    0.05802  PERSONNAZ-L-1985-V46-PL359
15    0.05126  KOHONEN-T-1990-V78-P1464    0.05702  GROSSBERG-S-1987-V11-P23

We see that the upper parts of both paths are identical, but they differ in the continuation. 
The arcs in the CPM path are thicker. 

We could display also the complete SOM network using essentially the same procedure as 
for the displaying of main path. But the obtained picture would be too complicated (too many 
vertices and arcs). We have to identify some simpler and important subnetworks inside it. 

Inspecting the distribution of values of weights on arcs (lines) we select a threshold 0.007
and determine the corresponding arc-cut - we delete all arcs with weights lower than the selected
threshold and afterwards delete also all isolated vertices (degree = 0).
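
The arc-cut itself is a simple filter. A minimal sketch, with a hypothetical function name and the weights given as a dictionary from arcs to values:

```python
def arc_cut(weights, threshold):
    """Keep arcs with weight >= threshold and the vertices they touch."""
    kept = {arc: w for arc, w in weights.items() if w >= threshold}
    # vertices that lost all their arcs become isolated and are dropped
    vertices = {x for arc in kept for x in arc}
    return vertices, kept
```

For example, `arc_cut(weights, 0.007)` would correspond to the threshold selected above.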

Now, we are ready to draw the reduced network. We first produce an automatic layout. We 
notice some small unimportant components. We preserve only the large main component, draw 
it and improve the obtained layout manually. To preserve the level structure we use the option 
that allows only the horizontal movement of vertices. 

Finally we label the 'most important vertices'. A vertex is considered
important if it is an endpoint of an arc with the weight above the selected threshold (in our case
0.05).

The obtained picture of the SOM 'main subnetwork' is presented in Figure 4. We see that the
SOM field evolved in two main branches. From CARPENTER-1987 the strongest (main path)
arc leads to the right branch, which disappears after some steps. The left, more vital branch
is detected by the CPM path. Further investigation of this is left to readers with additional
knowledge about the SOM field.

As complementary information we can determine Kleinberg's hubs and authorities vertex
weights [17]. Papers that are cited by many other papers are called authorities; papers that cite
many other documents are called hubs. Good authorities are those that are cited by good hubs
and good hubs cite good authorities. The 15 highest ranked hubs and authorities are presented in
Table 2. We see that the main authorities are located in the eighties and the main hubs in the
nineties. Note that, since we are using the relation $uRv \equiv u$ is cited by $v$, we have to interchange
the roles of hubs and authorities produced by Pajek.
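
The mutual reinforcement of hubs and authorities can be sketched as a simple iteration. This is an illustrative power iteration in the spirit of [17], not Pajek's implementation; arcs follow the convention of this paper, (u, v) meaning that v cites u, so the cited unit u collects authority and the citing unit v collects hub weight. The function name and the fixed number of iterations are assumptions.

```python
import math

def hubs_authorities(arcs, iterations=50):
    """Return (hub, auth) weights; an arc (u, v) means v cites u."""
    arcs = list(arcs)
    units = {x for arc in arcs for x in arc}
    hub = {u: 1.0 for u in units}
    auth = {u: 1.0 for u in units}
    for _ in range(iterations):
        auth = {u: 0.0 for u in units}
        for u, v in arcs:                # u is cited by the hub v
            auth[u] += hub[v]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        hub = {u: 0.0 for u in units}
        for u, v in arcs:                # v cites the authority u
            hub[v] += auth[u]
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth
```

With the arcs given in this orientation no interchange of roles is needed afterwards.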

An elaboration of the hubs and authorities approach to the analysis of citation networks 
complemented with visualization can be found in Brandes and Willhalm (2002) [8]. 

7 Second Example: US patents 

The network of US patents from 1963 to 1999 [21] is an example of a very large network
(3774768 vertices and 16522438 arcs) that, using some special options in Pajek, can still
be analyzed on a PC with at least 1 GB of memory. The SPC weights are determined in about 1
minute. This shows that the proposed approach can also be used for very large networks.

The obtained main path and CPM path are presented in Figure 5. Collecting from the United
States Patent and Trademark Office [22] the basic data about the patents from both paths, see
Table 3-6, we see that they deal with 'liquid crystal displays'.

But, in this network there should be thousands of 'themes'. How to identify them? Using 
the arc weights we can define a theme as a connected small subnetwork of size in the interval k 
.. K (for example, between k = and K = 3h) with stronger internal cohesion relatively to 
its neighborhood. 

To find such subnetworks we again use the arc-cuts. We select a threshold t and delete all
arcs with weight lower than t. In the reduced network we determine the (weakly) connected
components. The components of size in the range k..K, which we call (k, K)-islands, represent the
themes since:

• they are connected and of selected size, 

• all arcs linking them to their outside neighbors have weight lower than t, and 

• each vertex of an island is linked with some other vertex in the same island with an arc 
with a weight at least t. 

We discard components of size smaller than k as 'noninteresting'. 

The components of size larger than K are too large. They contain several themes. To
identify them we repeat the procedure on the network of these components with a higher threshold
value t'. Recently we developed an algorithm, named Islands [7], that by 'continuously'
changing the threshold identifies all maximal (k, K)-islands.
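
A single pass of the threshold procedure can be sketched as below. This covers only one fixed threshold t; it is not the Islands algorithm [7], which varies the threshold. The function name is hypothetical.

```python
def threshold_islands(weights, t, k, K):
    """Weakly connected components of size k..K after removing arcs < t."""
    adj = {}
    for (u, v), w in weights.items():
        if w >= t:                       # arc survives the cut
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    seen, islands = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:                     # depth-first search, direction ignored
            x = stack.pop()
            component.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if k <= len(component) <= K:
            islands.append(component)
    return islands
```

Components larger than K are simply skipped here; in the procedure described above they would be processed again with a higher threshold t'.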

We determined for SPC weights all (2,90)-islands in the US Patents network. The reduced
network of islands has 470137 vertices, 307472 arcs and for different k: $C_2 = 187610$,
$C_5 = 8859$, $C_{30} = 101$, $C_{50} = 30$ islands. The detailed island size frequency distribution is given
in Table 7 and presented in a log-log scale in Figure 6, which shows that it obeys the power law.

The main island has 90 vertices and contains the middle parts of the main path and the CPM
path. They also have a short common part. Again, the greedy strategy of the main path leads
to a less vital branch. Considering the basic data about the patents from Table 3-5, we see that
the main island also deals with 'liquid crystal displays'.

Figure 5: Main path and CPM path subnetwork of Patents

Table 3: Patents on the liquid-crystal display

patent   date          author(s) and title
2544659  Mar 13, 1951  Dreyer. Dichroic light-polarizing sheet and the like and the formation and use thereof
2682562  Jun 29, 1954  Wender, et al. Reduction of aromatic carbinols
3322485  May 30, 1967  Williams. Electro-optical elements utilizing an organic nematic compound
3512876  May 19, 1970  Marks. Dipolar electro-optic structures
3636168  Jan 18, 1972  Josephson. Preparation of polynuclear aromatic compounds
3666948  May 30, 1972  Mechlowitz, et al. Liquid crystal thermal imaging system having an undisturbed image on a disturbed background
3675987  Jul 11, 1972  Rafuse. Liquid crystal compositions and devices
3691755  Sep 19, 1972  Girard. Clock with digital display
3697150  Oct 10, 1972  Wysochi. Electro-optic systems in which an electrophoretic-like or dipolar material is dispersed throughout a liquid crystal to reduce the turn-off time
3731986  May 8, 1973   Fergason. Display devices utilizing liquid crystal light modulation
3740717  Jun 19, 1973  Huener, et al. Liquid crystal display
3767289  Oct 23, 1973  Aviram, et al. Class of stable trans-stilbene compounds, some displaying nematic mesophases at or near room temperature and others in a range up to 100°C
3773747  Nov 20, 1973  Steinstrasser. Substituted azoxy benzene compounds
3795436  Mar 5, 1974   Boller, et al. Nematogenic material which exhibit the Kerr effect at isotropic temperatures
3796479  Mar 12, 1974  Helfrich, et al. Electro-optical light-modulation cell utilizing a nematogenic material which exhibits the Kerr effect at isotropic temperatures
3806230  Apr 23, 1974  Haas. Liquid crystal imaging system having optical storage capabilities
3809458  May 7, 1974   Huener, et al. Liquid crystal display
3872140  Mar 18, 1975  Klanderman, et al. Liquid crystalline compositions and method
3876286  Apr 8, 1975   Deutscher, et al. Use of nematic liquid crystalline substances
3881806  May 6, 1975   Suzuki. Electro-optical display device
3891307  Jun 24, 1975  Tsukamoto, et al. Phase control of the voltages applied to opposite electrodes for a cholesteric to nematic phase transition display
3947375  Mar 30, 1976  Gray, et al. Liquid crystal materials and devices
3954653  May 4, 1976   Yamazaki. Liquid crystal composition having high dielectric anisotropy and display device incorporating same
3960752  Jun 1, 1976   Klanderman, et al. Liquid crystal compositions
3975286  Aug 17, 1976  Oh. Low voltage actuated field effect liquid crystals compositions and method of synthesis
4000084  Dec 28, 1976  Hsieh, et al. Liquid crystal mixtures for electro-optical display devices
4011173  Mar 8, 1977   Steinstrasser. Modified nematic mixtures with positive dielectric anisotropy
4013582  Mar 22, 1977  Gavrilovic. Liquid crystal compounds and electro-optic devices incorporating them
4017416  Apr 12, 1977  Inukai, et al. P-cyanophenyl 4-alkyl-4'-biphenylcarboxylate, method for preparing same and liquid crystal compositions using same

Table 4: Patents on the liquid-crystal display



patent   date          author(s) and title
4029595  Jun 14, 1977  Ross, et al. Novel liquid crystal compounds and electro-optic devices incorporating them
4032470  Jun 28, 1977  Bloom, et al. Electro-optic device
4077260  Mar 7, 1978   Gray, et al. Optically active cyano-biphenyl compounds and liquid crystal materials containing them
4082428  Apr 4, 1978   Hsu. Liquid crystal composition and method
4083797  Apr 11, 1978  Oh. Nematic liquid crystal compositions
4113647  Sep 12, 1978  Coates, et al. Liquid crystalline materials
4118335  Oct 3, 1978   Krause, et al. Liquid crystalline materials of reduced viscosity
4130502  Dec 19, 1978  Eidenschink, et al. Liquid crystalline cyclohexane derivatives
4149413  Apr 17, 1979  Gray, et al. Optically active liquid crystal mixtures and liquid crystal devices containing them
4154697  May 15, 1979  Eidenschink, et al. Liquid crystalline hexahydroterphenyl derivatives
4195916  Apr 1, 1980   Coates, et al. Liquid crystal compounds
4198130  Apr 15, 1980  Boller, et al. Liquid crystal mixtures
4202791  May 13, 1980  Sato, et al. Nematic liquid crystalline materials
4229315  Oct 21, 1980  Krause, et al. Liquid crystalline cyclohexane derivatives
4261652  Apr 14, 1981  Gray, et al. Liquid crystal compounds and materials and devices containing them
4290905  Sep 22, 1981  Kanbe. Ester compound
4293434  Oct 6, 1981   Deutscher, et al. Liquid crystal compounds
4302352  Nov 24, 1981  Eidenschink, et al. Fluorophenylcyclohexanes, the preparation thereof and their use as components of liquid crystal dielectrics
4330426  May 18, 1982  Eidenschink, et al. Cyclohexylbiphenyls, their preparation and use in dielectrics and electrooptical display elements
4340498  Jul 20, 1982  Sugimori. Halogenated ester derivatives
4349452  Sep 14, 1982  Osman, et al. Cyclohexylcyclohexanoates
4357078  Nov 2, 1982   Carr, et al. Liquid crystal compounds containing an alicyclic ring and exhibiting a low dielectric anisotropy and liquid crystal materials and devices incorporating such compounds
4361494  Nov 30, 1982  Osman, et al. Anisotropic cyclohexyl cyclohexylmethyl ethers
4368135  Jan 11, 1983  Osman. Anisotropic compounds with negative or positive DC-anisotropy and low optical anisotropy
4386007  May 31, 1983  Krause, et al. Liquid crystalline naphthalene derivatives
4387038  Jun 7, 1983   Fukui, et al. 4-(Trans-4'-alkylcyclohexyl) benzoic acid 4"'-cyano-4"-biphenylyl esters
4387039  Jun 7, 1983   Sugimori, et al. Trans-4-(trans-4'-alkylcyclohexyl)-cyclohexane carboxylic acid 4"'-cyanobiphenyl ester
4400293  Aug 23, 1983  Romer, et al. Liquid crystalline cyclohexylphenyl derivatives
4415470  Nov 15, 1983  Eidenschink, et al. Liquid crystalline fluorine-containing cyclohexylbiphenyls and dielectrics and electro-optical display elements based thereon
4419263  Dec 6, 1983   Praefcke, et al. Liquid crystalline cyclohexylcarbonitrile derivatives
4422951  Dec 27, 1983  Sugimori, et al. Liquid crystal benzene derivatives
4455443  Jun 19, 1984  Takatsu, et al. Nematic halogen Compound
4456712  Jun 26, 1984  Christie, et al. Bismaleimide triazine composition
4460770  Jul 17, 1984  Petrzilka, et al. Liquid crystal mixture
4472293  Sep 18, 1984  Sugimori, et al. High temperature liquid crystal substances of four rings and liquid crystal compositions containing the same
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Table 5: Patents on the liquid-crystal display 



patent   date          author(s) and title
4472592  Sep 18, 1984  Takatsu, et al. Nematic liquid crystalline compounds
4480117  Oct 30, 1984  Takatsu, et al. Nematic liquid crystalline compounds
4502974  Mar 5, 1985   Sugimori, et al. High temperature liquid-crystalline ester compounds
4510069  Apr 9, 1985   Eidenschink, et al. Cyclohexane derivatives
4514044  Apr 30, 1985  Gunjima, et al. 1-(Trans-4-alkylcyclohexyl)-2-(trans-4'-(p-substituted phenyl) cyclohexyl)ethane and liquid crystal mixture
4526704  Jul 2, 1985   Petrzilka, et al. Multiring liquid crystal esters
4550981  Nov 5, 1985   Petrzilka, et al. Liquid crystalline esters and mixtures
4558151  Dec 10, 1985  Takatsu, et al. Nematic liquid crystalline compounds
4583826  Apr 22, 1986  Petrzilka, et al. Phenylethanes
4621901  Nov 11, 1986  Petrzilka, et al. Novel liquid crystal mixtures
4630896  Dec 23, 1986  Petrzilka, et al. Benzonitriles
4657695  Apr 14, 1987  Saito, et al. Substituted pyridazines
4659502  Apr 21, 1987  Fearon, et al. Ethane derivatives
4695131  Sep 22, 1987  Balkwill, et al. Disubstituted ethanes and their use in liquid crystal materials and devices
4704227  Nov 3, 1987   Krause, et al. Liquid crystal compounds
4709030  Nov 24, 1987  Petrzilka, et al. Novel liquid crystal mixtures
4710315  Dec 1, 1987   Schad, et al. Anisotropic compounds and liquid crystal mixtures therewith
4713197  Dec 15, 1987  Eidenschink, et al. Nitrogen-containing heterocyclic compounds
4719032  Jan 12, 1988  Wachtler, et al. Cyclohexane derivatives
4721367  Jan 26, 1988  Yoshinaga, et al. Liquid crystal device
4752414  Jun 21, 1988  Eidenschink, et al. Nitrogen-containing heterocyclic compounds
4770503  Sep 13, 1988  Buchecker, et al. Liquid crystalline compounds
4795579  Jan 3, 1989   Vauchier, et al. 2,2'-difluoro-4-alkoxy-4'-hydroxydiphenyls and their derivatives, their production process and their use in liquid crystal display devices
4797228  Jan 10, 1989  Goto, et al. Cyclohexane derivative and liquid crystal composition containing same
4820839  Apr 11, 1989  Krause, et al. Nitrogen-containing heterocyclic esters
4832462  May 23, 1989  Clark, et al. Liquid crystal devices
4877547  Oct 31, 1989  Weber, et al. Liquid crystal display element
4957349  Sep 18, 1990  Clerc, et al. Active matrix screen for the color display of television pictures, control system and process for producing said screen
5016988  May 21, 1991  Iimura. Liquid crystal display device with a birefringent compensator
5016989  May 21, 1991  Okada. Liquid crystal element with improved contrast and brightness
5122295  Jun 16, 1992  Weber, et al. Matrix liquid crystal display
5124824  Jun 23, 1992  Kozaki, et al. Liquid crystal display device comprising a retardation compensation layer having a maximum principal refractive index in the thickness direction
5171469  Dec 15, 1992  Hittich, et al. Liquid-crystal matrix display
5175638  Dec 29, 1992  Kanemoto, et al. ECB type liquid crystal display device having birefringent layer with equal refractive indexes in the thickness and plane directions
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Table 6: Patents on the liquid-crystal display 



patent   date          author(s) and title
5243451  Sep 7, 1993   Kanemoto, et al. DAP type liquid crystal device with cholesteric liquid crystal birefringent layer
5283677  Feb 1, 1994   Sagawa, et al. Liquid crystal display with ground regions between terminal groups
5308538  May 3, 1994   Weber, et al. Supertwist liquid-crystal display
5319478  Jun 7, 1994   Funfschilling, et al. Light control systems with a circular polarizer and a twisted nematic liquid crystal having a minimum path difference of .lambda./2
5374374  Dec 20, 1994  Weber, et al. Supertwist liquid-crystal display
5408346  Apr 18, 1995  Trissel, et al. Optical collimating device employing cholesteric liquid crystal and a non-transmissive reflector
5539578  Jul 23, 1996  Togino, et al. Image display apparatus
5543077  Aug 6, 1996   Rieger, et al. Nematic liquid-crystal composition
5555116  Sep 10, 1996  Ishikawa, et al. Liquid crystal display having adjacent electrode terminals set equal in length
5683624  Nov 4, 1997   Sekiguchi, et al. Liquid crystal composition
5771124  Jun 23, 1998  Kintz, et al. Compact display system with two stage magnification and immersed beam splitter
5855814  Jan 5, 1999   Matsui, et al. Liquid crystal compositions and liquid crystal display elements
5991084  Nov 23, 1999  Hildebrand, et al. Compact compound magnified virtual image display with a reflective/transmissive optic
6005720  Dec 21, 1999  Watters, et al. Reflective micro-display system



Table 7: Island size frequency distribution 



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Figure 6: Island size frequency distribution 
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Table 8: Some patents from the 'foam' island 



patent   date          author(s) and title
4060439  Nov 29, 1977  Rosemund, et al. Polyurethane foam composition and method of making same
4292369  Sep 29, 1981  Ohashi, et al. Fireproof laminates
4357430  Nov 2, 1982   VanCleve. Polymer/polyols, methods for making same and polyurethanes based thereon
4459334  Jul 10, 1984  Blanpied, et al. Composite building panel
4496625  Jan 29, 1985  Snider, et al. Alkoxylated aromatic amine-aromatic polyester polyol blend and polyisocyanurate foam therefrom
4544679  Oct 1, 1985   Tideswell, et al. Polyol blend and polyisocyanurate foam produced therefrom
4714717  Dec 22, 1987  Londrigan, et al. Polyester polyols modified by low molecular weight glycols and cellular foams therefrom
4927863  May 22, 1990  Bartlett, et al. Process for producing closed-cell polyurethane foam compositions expanded with mixtures of blowing agents
4996242  Feb 26, 1991  Lin. Polyurethane foams manufactured with mixed gas/liquid blowing agents
5169873  Dec 8, 1992   Behme, et al. Process for the manufacture of foams with the aid of blowing agents containing fluoroalkanes and fluorinated ethers, and foams obtained by this process
5187206  Feb 16, 1993  Volkert, et al. Production of cellular plastics by the polyisocyanate polyaddition process, and low-boiling, fluorinated or perfluorinated, tertiary alkylamines as blowing agent-containing emulsions for this purpose
5308881  May 3, 1994   Londrigan, et al. Surfactant for polyisocyanurate foams made with alternative blowing agents
5558810  Sep 24, 1996  Minor, et al. Pentafluoropropane compositions






Table 9: Some patents from 'fiber optics and bags' island 



patent       date                 author(s) and title
[illegible]  Jul 24, 1984         Shaw, et al. Fiber coupler displacement transducer
[illegible]  Apr 16, 1985         Bair. Phenanthrene derivatives
[illegible]  Jul 23, 1985         Bair. Perylene derivatives
[illegible]  May 20, [illegible]  Dyott, et al. Optical fiber polarizer
[illegible]  Jun 30, 1987         Baxley, et al. Bag pack
[illegible]  Jan 12, 1988         Bair. Anthracene derivatives
[illegible]  Nov 15, 1988         Shaw, et al. Backward-flow ladder architecture and method
[illegible]  Nov 22, 1988         Benoit, Jr., et al. Thermoplastic bag pack
[illegible]  Mar 7, 1989          Fling. Fiber optic bidirectional data bus tap
[illegible]  Mar 7, 1989          Prince, et al. Handled bag with supporting slits in handle
[illegible]  May 9, 1989          Bair. [illegible] derivatives
4981216      Jan 1, 1991          Wilfong, Jr. Easy opening bag pack and supporting rack system and fabricating method
4997249      Mar 5, 1991          Berry, et al. Variable weight fiber optic transversal filter
5188235      Feb 23, 1993         Pierce, et al. Bag pack
5307935      May 3, 1994          Kemanjian. Packs of self opening plastic bags and method of fabricating the same
5363965      Nov 15, 1994         Nguyen. Self-opening thermoplastic bag system



To further illustrate the results obtained by the Islands algorithm, we selected two smaller
islands at lower levels; see Figure 8 (50 vertices) and Figure 9 (38 vertices). Retrieving the
basic data about some patents in these islands from the United States Patent and Trademark
Office (see Table 8 and Table 9), we can label the theme of the first island as 'producing
a foam'. The theme of the second island initially deals with 'fiber optics', but in the upper
part it switches to 'bag pack system'.

8 Conclusions 

In this paper we proposed an approach to the analysis of citation networks that can also be
used for very large networks with millions of vertices and arcs.

On the test cases, the methods SPC, SPLC and NPPC produced almost the same results. Since
the SPC method has additional 'nice' properties, it could be considered a 'first choice'.
However, a well-grounded recommendation requires more experience from analyses of
real-life large citation networks.
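
As a reminder of what such an arc weight measures, the following is a minimal sketch, assuming that SPC denotes the search path count, i.e. the number of different source-to-sink paths that pass through an arc. The function name `spc_weights` and the toy network are illustrative only and do not come from Pajek. Each arc is visited a constant number of times, so the running time is linear in the number of arcs.

```python
from collections import defaultdict


def spc_weights(arcs):
    """Return {(u, v): number of source-to-sink paths through arc (u, v)}."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Kahn's algorithm: topological order of the acyclic network.
    indeg = {n: len(pred[n]) for n in nodes}
    order = [n for n in nodes if indeg[n] == 0]
    for n in order:  # the list grows while we iterate over it
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                order.append(m)
    if len(order) != len(nodes):
        raise ValueError("network is not acyclic")

    # n_in[v]: number of paths from any source to v.
    n_in = {}
    for n in order:
        n_in[n] = sum(n_in[p] for p in pred[n]) if pred[n] else 1
    # n_out[v]: number of paths from v to any sink.
    n_out = {}
    for n in reversed(order):
        n_out[n] = sum(n_out[s] for s in succ[n]) if succ[n] else 1

    return {(u, v): n_in[u] * n_out[v] for u, v in arcs}


# Toy network with three source-to-sink paths: a-b-d-e, a-c-d-e and a-e.
toy = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("a", "e")]
weights = spc_weights(toy)
```

In the toy network the arc (d, e) lies on two of the three paths, so it receives the largest weight.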

The granularity of the results strongly depends on the range k .. K of sizes for 'interesting
themes'. By varying these two parameters we get larger or smaller sets of themes.
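
The effect of the size range can be sketched on a toy example. The code below is not the Islands algorithm; it is a simplified stand-in, assuming a plain arc-cut: keep only arcs with weight at least a threshold, take the weakly connected components of what remains, and report those whose size lies in the range k .. K. The function name `themes_by_arc_cut` and the weights are hypothetical.

```python
def themes_by_arc_cut(weights, threshold, k, K):
    """Weak components of arcs with weight >= threshold whose size is in [k, K]."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    kept = [g for g in groups.values() if k <= len(g) <= K]
    return sorted(kept, key=lambda g: (-len(g), sorted(g)))


# Hypothetical arc weights on a small network.
toy_weights = {("a", "b"): 5, ("b", "c"): 4, ("c", "d"): 1,
               ("d", "e"): 3, ("f", "g"): 2}
small = themes_by_arc_cut(toy_weights, threshold=2, k=2, K=2)
wide = themes_by_arc_cut(toy_weights, threshold=2, k=2, K=3)
```

With the narrow range only the two-vertex components are reported; widening K to 3 also admits the component {a, b, c}.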

Instead of arc-cuts we could also consider vertex-cuts with respect to p-cores on SPC
weights [6], with the p-function

\[
p(v, W) = \max\Bigl( \sum_{u \in W : uRv} w(u,v), \; \sum_{u \in W : vRu} w(v,u) \Bigr)
\]
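
A minimal sketch of this idea follows, assuming that the p-function takes the larger of two sums: the weights of arcs entering v from W and the weights of arcs leaving v into W. It also assumes the usual generalized-core definition, in which the p-core at level t is the largest vertex set W such that every v in W has p(v, W) >= t. The names `p_value` and `p_core` are illustrative, and the simple repeated-removal loop is quadratic; it is not the efficient algorithm of [6].

```python
def p_value(v, W, w):
    """Larger of the summed in-weights and out-weights of v restricted to W."""
    inflow = sum(w.get((u, v), 0) for u in W)
    outflow = sum(w.get((v, u), 0) for u in W)
    return max(inflow, outflow)


def p_core(vertices, w, t):
    """Repeatedly drop vertices whose p-value within the remaining set is below t."""
    W = set(vertices)
    while True:
        weak = {v for v in W if p_value(v, W, w) < t}
        if not weak:
            return W
        W -= weak


# Hypothetical arc weights on a small acyclic network.
toy_w = {("a", "b"): 4, ("a", "c"): 1, ("b", "d"): 4,
         ("c", "d"): 1, ("d", "e"): 5}
core4 = p_core("abcde", toy_w, t=4)
core5 = p_core("abcde", toy_w, t=5)
```

Raising the level t shrinks the core: at t = 4 only the weakly attached vertex c is cut away, while at t = 5 only the endpoints of the heaviest arc remain.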






The subnetworks approach only filters out the structurally important subnetworks, thus
providing a researcher with smaller, manageable structures which can be further analyzed
using more sophisticated and/or substantive methods.
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